ON CONVEX FUNCTIONS AND THE FINITE ELEMENT METHOD 

NESTOR E. AGUILERAt AND PEDRO MORIN* 

Abstract. Many problems of theoretical and practical interest involve finding a convex or 
concave function. For instance, optimization problems such as finding the projection on the convex 
functions in H (fl), or some problems in economics. 

In the continuous setting and assuming smoothness, the convexity constraints may be given lo- 
cally by asking the Hessian matrix to be positive semidefinite, but in making discrete approximations 
two difficulties arise: the continuous solutions may be not smooth, and an adequate discrete version 
of the Hessian must be given. 

In this paper we propose a finite element description of the Hessian, and prove convergence under 
very general conditions, even when the continuous solution is not smooth, working on any dimension, 
and requiring a linear number of constraints in the number of nodes. 
^^^ Using semidefinite programming codes, we show concrete examples of approximations to opti- 

^*-*». mization problems. 

O 

^v.1 Key words. Finite element method, optimization problems, convex functions, adaptive meshes. 






< 



O 



4^ 



s 



AMS subject classifications. 65K10, 65N30. 



1. Introduction. Convex and concave functions appear naturally in many disci- 
plines of science such as physics, biology, medicine, or economics, and they constitute 
an important part of mathematics, naturally putting forth the question of how these 
I— I functions can be approximated numerically. 

"^ Particularly interesting instances are optimization problems where the feasible 

^^ solutions are a family of convex functions. For example, let H^{Q) denote the usual 

Sobolev space of L'^{Q) functions having all weak derivatives of order up to k in 
1/^(1]), and suppose 1] C M^ is a convex domain. We may be interested in finding the 
2 projection of a given / G H^{ft) onto the set C of convex functions in H^{Q)^ 



min||w-/||^,. (1.1) 



aec 



^ Or, given f ^ H ^(^), we may be interested in minimizing the Dirichlet func- 

jr^ tional 
OO 

^ Jf{^) = l /|grad^x)pdx + (/,ii), (1.2) 

^^ over the set of convex functions u defined in Q with J^udx = 0. 

^^ Often the convexity requirement in applications comes from a reasonable shape 

:• assumption on the model, which could be replaced by or added to other shape con- 

.(^ straints such as radial symmetry, harmonicity or upper and lower bounds. This is the 

S^ case, for instance, of Newton's problem of minimal resistance [4, 5, 12, 13]. 

H More surprisingly perhaps, the convexity may be a consequence of the model, as in 
some mechanism design problems in economics. For example. Rochet and Chone [17] 
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and Manelli and Vincent [14] (among others) study what we wih cah the monopolist 
problem^ in which the functional to be maximized is the seller's expected revenue, 

max / (^gTdidu{x) ' X — u{x) — c\gTdidu\ ) /(x) dx, (1.3) 

'"^^ Jq 

where Q = [0, 1]^ is the d-dimensional unit cube, c is a non-negative constant, / is 
a probability density function on Q, and C is the set of convex functions u satisfying 
u{0) = and < giadu < 1 (component- wise). In this problem, the convexity restric- 
tion comes from the requirement of incentive compatibility. We refer the interested 
reader to [14] and the references therein for further details on the model. 

From a theoretical point of view, Carlier and Lachand- Robert [6] obtained the C^ 
regularity of a variant of the monopolist problem (1.3), under some restrictions on the 
domain Q and the density /. They obtained also C^ regularity for convex minimizers 
of functionals similar to that in (1.2), with the condition J^udx = substituted for 
u = uo in d^. 

From a numerical point of view, Carlier, Lachand- Robert and Maury [7] proposed 
a finite element scheme for minimizers in H^{Q) or L'^{Q) of functionals encompass- 
ing (1.1), (1.2) and (1.3). In a two dimensional setting, they consider finite element 
functions which are the interpolants of convex functions, and show that this definition 
is equivalent to an intrinsic one, stated only in terms of the value of the function at 
the grid points. In a way, they consider (weak) second order pure derivatives in every 
possible direction allowed by the underlying grid. The problem with this description 
is that it is non-local, and the number of constraints needed in two dimensions (after 
pruning) reportedly grows approximately as A^^-^, where N is the number of nodes 
in the grid. Moreover, their approach is very difficult to extend in practice to higher 
dimensions. 

The work of Carlier et al. [7] includes the problem of finding 

min / \u — f\ dx subject to u e I/^(^), u convex, u < f^ 
Jn 

for given / G L^(r^), i.e., a L^-norm projection, and — as they point out — this problem 
is equivalent to that of finding the convex envelope /** of /. Thus, minimizing over 
convex functions and finding the convex hull of the epigraph of a function are two 
quite related tasks. 

Being a central problems in computational geometry, there are a number of well 
established codes for finding the convex hull of a set of points in R^, which are 
very efficient in low dimensions. Hence, it is natural to try to use these codes to 
approximate convex functions, an approach which Lachand- Robert and Oudet [11] 
applied to several problems. 

There is a large literature on convex functions in a continuous setting, well rep- 
resented by the book by Rockafellar [18]. Also the discrete mathematics community 
has produced quite a few definitions for convexity of functions defined on lattices (see, 
e.g., the article by Murota and Shioura [16] and the references therein). But in either 
case, the definitions are usually of a non-local nature. 

One of the main difficulties in obtaining discrete approximations to convex func- 
tions in dimensions higher than one, lies in giving a local and finite description of 
them. Though this could be done for smooth functions of continuous variables by 
asking the Hessian matrix to be positive semidefinite at all points, we know of no 
similar characterizations for finite element functions on meshes. 



This article builds on our previous work [1], where we gave a theoretical frame- 
work for approxiniating convex functions using a finite difference discretization of the 
Hessian matrix and semidefinite programs. 

Let us recall that a semidefinite program is an optimization problem of the form 

min c • X 
subject to Q ^\ 

XiAi H h XnAn - ^0 ^ 0, 

X gM^, 

where c G M^, Aq, Ai, . . . , A^ are symmetric m x m matrices, and A y indicates 
that the symmetric matrix A is positive semidefinite. By letting the matrices Ai be 
diagonal, we see that the program (1.4) is a generalization of linear programming (and 
includes it strictly). Thus, in a semidefinite program the constraints can be a mixture 
of linear inequalities and positive semidefinite requirements. We refer the reader to the 
article by Vandenberghe and Boyd [20] for further properties of semidefinite programs. 

In this article we carry over the framework in [1] into the approximation with finite 
elements. We do so through a weak definition of a finite element (FE) Hessian and 
corresponding definition of FE-convex functions.* Being the definition very natural 
and straightforward, the main goal of this article is to provide a solid theoretical 
foundation of this approach and to illustrate its applicability to a broad range of 
models. 

In contrast to finite differences, it is now very easy to adaptively refine the meshes 
and reduce drastically the computational effort, especially taking into account the 
fact that the time needed by the semidefinite programs is more than quadratic on the 
number of nodes. 

Although not linear, our approach seems very natural and has many advantages. 
Being of a local nature, the number of constraints grows only linearly with the number 
of nodes, and it works for any dimension of the underlying space. 

The rest of this paper is organized as follows. 

In section 2 we introduce the FE-Hessian and FE-convex functions, discussing 
several related issues. We give examples and counter-examples showing how FE- 
convex functions relate to usual convex functions and the finite element version given 
by Carlier et al. [7]. 

In section 3 we prove the main results of the paper. We show that, under ap- 
propriate assumptions and norm, every convex function, even if not smooth, can be 
approximated by a sequence of FE-convex functions, and that the limit of every con- 
vergent sequence of FE-convex functions (with space discretization parameter going 
to zero) is a convex function. We also show some compactness results, such as that a 
(norm) bounded sequence of FE-convex functions has a convergent subsequence (to a 
convex function). 

In section 4 we show how the previous results may be used to approximate many 
optimization problems, providing a general framework for the numerical treatment 
of optimization problems over convex functions, and prove some theoretical results 
supporting the potential applicability to a broad range of concrete problems. We 
do not focus here on a specific problem, and thus our convergence results will not 
provide convergence rates, since these depend on the regularity properties of the 
exact solutions, and other features of the particular problem at hand. 



*We denote them with the prefix "FE" to distinguish them from other definitions, such as those 
in [1, 7, 16]. 





(a) Regular diagonal mesh. (b) Interpolant of {xi + X2)^. 

Fig. 2.1. Interpolant of a convex quadratic function on a regular mesh. 



In section 5 we discuss the actual numerical implementation, and give concrete 
examples of the monopolist problem (1.3), and the Dirichlet integral (1.2). 
We conclude by summing up and commenting on the results we found. 

2. Discrete Hessians and discrete convexity. There are two main issues 
when defining the set of discrete approximants to be used: 

1. it must be rich enough to approximate every convex function, and 

2. it must be not too large, to avoid convergence to non-convex functions. 
The first point is very natural, and necessary to be able to approximate the 

solution of the problem. The second point looks artificial at first sight, but if it did 
not hold, a sequence of functions in this kind of sets could converge to a non-convex 
function. 

On one hand, as noted by P. Chone in his Ph.D. Thesis [8], the affine Lagrange 
interpolant of a convex function need not be convex. Consider for instance a regular 
diagonal mesh as shown to the left in figure 2.1, and suppose Q = (—1,1) x ( — 1,1). 
The interpolant on this mesh of the quadratic convex function {xi + ^2)^, shown to 
the right of that figure, is clearly not convex. 

On the other hand, if we consider a convergent sequence of convex piecewise linear 
functions on a sequence of meshes like those of figure 2.1, with mesh size tending to 
zero, then the limit will satisfy 



dh 



9xi9x2 



<0. 



This is a consequence of proposition 1 in [7] , which was first proved in [8] . It clearly 
indicates that not all convex functions can be approximated by discrete functions 
that are convex in the usual sense, i.e., the definition of the discrete approximants to 
convex functions needs to be more involved. 

In order to proceed, we briefly review some concepts and set some notation. If 
1] is a bounded open convex set in R^ {d > 2), u e (7^(1^) and x = (xi, . . . ^Xd) G ^, 



the Hessian Hu{x) G 



pdxd 



is defined as the matrix whose ij entry is the second order 



partial derivative of u in the directions Xi and Xj , 

{Hu{x)).. = diju{x). 

As is well known, u G C'^{Q) is convex if and only if Hu is positive semidefinite 
everywhere in Q^ in symbols 

Hu{x) >z for ah x e Q. 
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When u is not smooth enough, we may nevertheless consider the Hessian in the 
distribution sense. In other words, we may define Hu as a matrix of distributions 
such that for every cp G Cq^{Q)^ {Hu^ cp) is the matrix of numbers 

{Hu,^) = {u,Hy,). (2.1) 

In 1971, Dudley [9] proved that if i^ is a distribution on Q such that 

{Hu, (f) hO for ah (p e C^{n), (f > 0, (2.2) 

then u belongs to the (Lebesgue) equivalence class of a continuous convex function in 
Q. Conversely, if li is a convex continuous function then (2.2) holds (in the distribution 
sense). 

By allowing some smoothness on u, say u G H^{Q), we may rewrite (2.1) and 
interpret Hu as a matrix of distributions [{Hu)ij] satisfying 

{{Hu)ij,cp) = {Hu, cp)ij = - / diu{x) djcp{x) dx, for ah cp e C^{ft). (2.3) 

In this case, the equality in (2.3) also holds for all cp G Hq{Q), and Dudley's results 
imply that given u G H^{Q), li is a continuous convex function in Q if and only if 

HyUhO for eyeiy V e H^{n),v>0, (2.4) 

where for convenience, for u G H^{Q) and v G Hq{Q), we have denoted by HyU the 
matrix whose ij entry is 

{Hyu)ij = -{diU,djv). 

It is then natural to define a discrete Hessian in the finite element setting along 
these lines. To do so, it will be convenient to use two different families of finite 
element basis functions. The first one, {(/>^} indexed by r G /t^i^i' ^^^^ ^^ used for 
approximations, and the second one, {(Ps} indexed by 5 G ^t^gt, will be used as test 
functions, and we will assume that 

(p^{x) > for aWxeft and all s G //^est- (2-5) 

Vh and Wh will denote the (real) linear spaces spanned by {^^} and {(p^}, respec- 
tively, and again for simplicity we will assume Vh C H^{Q) and Wh C Hq{Q). h will 
denote, as usual, a discretization parameter, equivalent to the maximum diameter of 
the elements of the underlying grid. 

For u G Vh and each s G ^t^st^ ^^ define the FE-Hessian (of u with respect to cps), 
H^u, by 

HgU = H^hU, 
and in particular, if u = (j)^, we define 



so that 






We are now in a position to state the following 

Definition 2.1. We will say that u e Vh is FE-convex (with respect to {0^} 
and {(f^}) if 



H^u h for all s e I^, 



test' 



If u eVh is convex in the usual sense, and the conditions (2.5) hold, by Dudley's 
results in the form (2.4), H^u >z for all s G //^st- Therefore, convex functions in Vh 
are FE-convex. 

As was shown in the example of figure 2.1, the interpolant of a continuous con- 
vex function need not be convex. However, as we will see next, in that example 
the interpolant is FE-convex, and therefore in general FE-convexity does not imply 
convexity. 

Example 2.2. Let us consider a regular diagonal mesh in 1] = (0^ 1) x (0? 1)^ ^s 
shown in figure 2.1, let h be the length of the shorter sides of the triangles, and let 
Vh and Wh consist of piecewise linear functions. 

A simple calculation shows that, if Uh G Vh and (p^ G Wh is the function which 
equals 1 on the interior node with coordinates (a, b) and vanishes in the other mesh 
nodes, then 



H^h 



a (3 
(i 7 



where 



a = Uh{a - h,b) ^ Uh{a ^h,h) - 2uh{a, 6), 

(3 = - \2uh{a,h) ^ Uh{a - h,h - h) ^ Uh{a ^ h,h ^ h) 

- {uh{a, h - h) ^ Uh{a,h ^ h) ^ Uh{a - h,h) ^ Uh{a + h, b)) j , 
7 = Uh{a, h-h) ^ Uh{a, b ^ h) - 2uh{a, b). 

If Uh is the Lagrange interpolant of the quadratic function u{xi^X2) = {xi -\-X2 — 
1)^, another simple calculation shows that 

SO that Uh is FE-convex but it is not convex as we know from figure 2.1. <) 

It is worth noticing that, in general, it is not true that the interpolant of a convex 
function is FE-convex, even for some highly regular meshes. In order to illustrate 
this, we have sketched some common patterns of regular meshes in figure 2.2. It can 
be readily seen that the example 2.2 for the diagonal mesh in figure 2.1(a) can be 
carried over to the "chevron" or "alternating" mesh. However, the behavior is quite 
different for the "crisscross" and "Union Jack" patterns. 

Example 2.3. Consider as in the previous example Q = (0, 1) x (0, 1), but now 
a regular "Union Jack" mesh in Q as shown in figure 2.2(c). As before, let h be the 
length of the shorter sides of the triangles, and let Vh and Wh consist of piecewise 
linear functions. 

In this mesh, the nodes inside Q can have either 8 or 4 neighbors. If (a, b) is a 
mesh node having 8 neighbors and cps is the corresponding nodal basis function, for 

6 
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(a) Chevron or alternating. 



(b) Crisscross. 



(c) Union Jack 



Fig. 2.2. Other common patterns for regular meshes. 



Uh G Vh we now have 



H%h 



a /3 

/? 7 



where 



that 



a = Uh{a - h,b) ^ Uh{a -^ h,b) - 2uh{a, h), 
/3 = - {uh{a -^ h,b^ h) ^ Uh{a - h,b- h) 

— Uh{a -\- h^b — h) — Uh{a — h^b -\- h)) , 
7 = Uh{a, b-h) ^ Uh{a, b ^ h) - 2uh{a, b). 

If Uh is the interpolant of the quadratic function u{xi^X2) = {xi + ^2)^, we see 



HSh 



2 4 
4 2 



^0, 



so that Uh is not FE-convex. 

Similar examples can be constructed for the "crisscross" meshes since these are 
essentially 45° rotations of "Union Jack" meshes. 

We conclude from these examples that our concept of FE-convex functions neither 
contains nor is contained in that of [7]. However, as we show in the next section, these 
concepts have many common features. 

3. Limits of FE-convex functions. Having defined FE-convexity through a 
discrete Hessian, we would like to see how these concepts may be used to approximate 
convex functions. 

The first decision we have to make is in what sense the approximation will be 
done. By the very definition of the FE-Hessian in (2.3), it is natural to consider the 
approximation in the H^ sense, and this is what we will do, but of course other spaces 
could be used. In particular, even when working with approximations in H^{Q)^ we 
will make use of the spaces W^'^(l]) consisting of the functions having at least k weak 
derivatives in L^i^Q). 

On the other hand, it will be convenient to work with a sequence of finer and finer 
meshes, A^^'^, with /i^ | as n ^ 00. However, so as not to clutter the notation, we 
will drop the index n, and we will also assume < h < 1. 

We are confronted now with several tasks: 



1. Suppose {uh)h is a sequence, /i | 0, of FE-convex functions Uh G V^, and 
suppose that, as /i | 0, Uh has a H^{Q) weak hmit u G J^^(r^). Is li convex? 

2. Given a bounded sequence {uh)^ with Uh FE-convex in V^. Is there a con- 
vergent subsequence? 

3. Recahing Chone's observations and the example in figure 2.1, can any convex 
function in H^{fl) be approximated as much as desired (in that space) by 
FE-convex functions (for appropriate Vh and W/^)? 

The first two issues wih be covered by theorem 3.3 and corohary 3.4. The last 
issue will be covered by theorem 3.6, but it is somewhat different in fiavor from the 
previous results, since we will need some properties of the approximating spaces Vh. 
These will depend on the choice of the finite element families, which can vary widely. 

Since it is not our purpose in this paper to present the results under the most gen- 
eral conditions, for simplicity we will restrict our attention to C^ Lagrange elements, 
hoping that the interested reader will be able to adapt the proof of theorem 3.6 to 
other families. 

We start by observing that if Q is any convex domain, then the convex functions 
in H^{Q) may be approximated by smooth convex functions: 

Lemma 3.1. Suppose Q is a bounded convex open subset ofR^. If u is a convex 
function in H^{ft), for any e > there exists a convex v G (7^(1^) such that 

Proof Let (p G Cq^, (f >0^ with support inside {x G M^ : \x\ < 1} and J cpdx = 1, 
and for J > consider the mollifier 

us{x) =u^ (ps{x) = ^ / u{y) (p{S~^ {x - y)) dx, 

which is well defined in 

9.8 = {x eVt: dist(x, dVt) > S}, 

where dist(x, A) is the distance from x to the set A. 

As is well known, us G (7^(1^5), and since u is convex and us is an average with 
non- negative weights, us is convex in Qs- For S' > fixed, us converges to u in 
H^{9s') as 5 I 0, and moreover us G C^ {fts') if ^ < S\ Thus the result is true for 
Qs' foi" every S' > 0. 

Given its geometry, it is easy to obtain the result for Q by using suitable dilations. 
For instance, pick xq G Q and consider for 1 < A < 2, the set Q^ = {xq -\- X{x — xq) : 
X G 1^}, and for u defined on ft consider u^ defined on 9^ by u^{xo-\-X (x—Xq)) = u{x). 
Then 9^ is convex, \\u^ — u\\jji(^^^ -^ as A | 1, and if u is convex in ft then u^ is 
convex in l]'^. 

Thus, fixing first A > 1 so that \\u^ - u\\^i^^^ < e/2, then S' > so that {9^)s^ D 
(^, and finally ^ > so that \\u^ — i^^) s\\ h^ ( (n^) ,) ^ ^/^' ^^ ^sive 

<£/2 + ||M^-(u^),||^,((^,)^,)<e. 



From now on we will assume that we have a sequence of meshes At^, with /i j 0, 
each consisting of a family T^ of non-overlapping closed d-dimensional simplices such 
that for each h^ Q = Urp^j-hT. This implies that Q is the interior of a polyhedron 
(intersection of finitely many half-spaces) . Recalling that we are thinking of a hierar- 
chical sequence of meshes and we require the non- negativity condition (2.5), we now 
make some additional assumptions on the meshes and discrete spaces we will work 
with, following chapter 4 of the book by Brenner and Scott [3]. 

Assumptions 3.2. We will denote by \A\ the Lebesgue measure of A, and indicate 
by a c^ b that for some positive constants Ci and C2 (independent of h and u) we have 
Cia<b< C2a, 

1. For allO<h< h' , Vh' C F/, C H^{Q) and Wh' C 1^/, C HI{Q). 

2. There exists a linear operator X^ with values in Vh (the interpolant^^ an 
integer m>2, and a constant C independent of u and h, such that 



\u-l''u\\„,,c.,<CK 



m—1 



U\ 



In particular, U/^V^ is dense in H^{Q). 

3. Condition (2.5) holds, i.e. 

(p^{x) > for all X eQ, all h > 0, and all s G 1^^^^. 

4. Given Hq and sq G I test ^ /^^ every h < Hq there exist coefficients a^ > such 



that 



Yl ^- 



^so= / . ^s^ 



5. Given cp G Cq^{Q), (p > 0, there exists a sequence {wh)h converging to if> in 
Hl{Q) such that 



Wh= Yl ^^s^s^ ^^' 



with a^ > for all h in the sequence and s G I test- 
In particular, ^JhWh is dense in HKP). 
6. For all h, 



/. 

Jn 



\T\ c^h"^ forallTeT^, 

(f^ dx c^ h"^ and |grad(/?^| < C/h for all s G Itest- 



These conditions will be satisfied for quasi-uniform families, taking C^ Lagrange 
elements with polynomials of degree less than m for the trial space V^, and piecewise 
linear elements for the test space Wh- The assumptions also hold choosing Wh as the 
finite element space of continuous piecewise polynomial functions of any fixed degree. 
Assumption 3.2.5 is guaranteed because Wh will always contain the piecewise linear 
finite element functions. The only detail to take into account is the choice of the basis 
functions in order to fulfill assumption 3.2.3. If the degree is bigger than one, we 
cannot use as (p^ the canonical nodal basis functions of Wh, because some of them 
change sign. The construction is still possible though (see section 5.1 for details). 
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The following is one of the main results of the paper. 

Theorem 3.3. Let {uh)^^ be a sequence converging weakly in H^{Q) to u as h I 0, 
such that for each h, Uh G Vh and 

H^Uh h for all s e 1^^,^. 

Then u is convex. 

Proof. By assumption 3.2.4, for arbitrary but fixed Hq and Sq G /test' ^^^ ^^y 
h < ho we may find coefficients a^ > so that (ps^ = ^^ a^ Lpg- 

Therefore, since H^Uh h for every s G //^st^ setting w = cpso we obtain 



H^Uh = XI ^^ ^s^h ^0 for /i < ho. 



Given that u^ converges weakly in H^^ft) to u^ we now have 

H^u = lim H^Uh h 0, 

and finally, by assumption 3.2.5, H^u >z for all non-negative (p G Co^(l]), and the 
theorem follows by Dudley's results (2.2).D 

Since the unit ball in H^{Q) is weakly compact, we have: 

Corollary 3.4. Let (i^^)^ be a bounded sequence in H^{Q) such that for every 
h the function Uh G Vh is FE- convex. Then there exists a subsequence that converges 
weakly in H^(Q) to some function u G H^{Q), and this function is necessarily convex. 

This theorem and its corollary answer the first two issued raised at the beginning 
of this section. In order to proceed further, and answer the last one, we will need the 
following result by Hoffman and Wielandt [10]: 

Theorem 3.5. There exists a positive constant Cd, depending only on the dimen- 
sion d, such that if A = [aij] and B — \bi^\ are symmetric d x d matrices, and A and 
ji are their minimum eigenvalues, then 



|A — /i| < Cd max |a^j — h. 



^j\ 



The following is the second main result and responds the third issue raised at the 
beginning of this section. 

Theorem 3.6. Ifm>2, given u G H^{Q), u convex and e > {), there exist h > 
and Uh G Vh such that 

and 

H^iuh) h for all s € I^st- 



Proof. By lemma 3.1, it will be enough to assume that i^ is a C^(yt) convex 
function. 

In the sequel we will denote by C, C or C" ^ positive constants which may vary 
from one occurrence to another, even in the same line, which may depend on u (which 
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we consider fixed from now on), ft, the dimension d and the regularity degree m, but 
are independent of h. For instance, we write 

ll'^ll//'^(Q) = ^' 
Let us consider the auxihary function 

g{x) = -|x|^ 

which is a convex C^(M^) function. The regular Hessian of ^, Hg^ is 

Hg = Id = identity matrix in R"^^"^, 
and therefore, for any w G Hq{Q)^ 

HwQ = {Hg, w) = / Idwdx= { wdx) Id, 
Jn Jn 

and in particular, 

Hs9 ={J^^s d^) ^d^ fo^ ah h and ah 5 G itst- (3.1) 

For 5 and h positive and small, let 

V = u-\- Sg and Uh = T^v, 

where X^ denotes the interpolant considered in assumption 3.2.2. We notice that since 
the third derivatives of g vanish, h is bounded and tti > 2, 

\\^''9\\HHn)<C\\g\\Hr^rn)=C, 



and 



1^ - ^hWH^n) = \\u-I^u- Sl^gW^,^^^ <\\u- I^u\\^,^^^ + W^^^gWm 



m 



Thus, if h-\- S is small enough (depending on e), we obtain the first inequality of 
the theorem. 

In order to see that H^Uh h 0, we first look at the eigenvalues of H^v for s G /t^gf 
If C ^ ^^5 using that u is convex and smooth, equation (3.1), and the bounds in 
assumption 3.2.6, we obtain 



(iJ> C) ■ C = {{Hu, v^) C)-C + 5 {{Hg, v^^) C) • C 



>0 + (5|C|^ / v's dx > C(5 ICI^ /i^ 



and therefore the eigenvalues of H^v are bounded below by 



CSh". (3.2) 
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In order to compare the entries of H^v and H^Uh, we use assumptions 3.2.6 
and 3.2.2 to obtain 

Thus we may use theorem 3.5 and the bounds (3.2) and (3.3) to obtain that the 
eigenvalues of H^u^ are bounded below by 

CSh"^ - C'/i^+^-2 = Ch"^ {5 - /i^-2). 

The theorem follows now by taking 8 > Ch^~'^ if m > 2, and at the same time 
S -\- h small. D 

The condition tti > 2 in theorem 3.6 implies that the functions in Vh will have 
to be piecewise polynomials of degree at least 2, meaning that the result may not 
hold for linear finite elements. This does not seem quite satisfactory, and we need to 
elaborate on the necessity of this condition. 

As we have seen in example 2.2, for meshes such as those of figure 2.1(a) or 
figure 2.2(a), if we use piecewise linear functions for the space Vh {m = 2), the 
discrete Hessian becomes a finite difference scheme (except for a factor of h'^) which 
is exact for quadratic functions. The results presented in [1] can be adapted to show 
that in this case theorem 3.6 also holds for ?7i = 2. 

On the other hand, as we have seen in example 2.3, the discrete Hessian with 
piecewise linear functions for very regular meshes such as those in figure 2.2(b) or (c) 
is not exact for quadratic functions, which means that we may not be able to get good 
approximations. In the following example we report numerical evidence supporting 
the necessity of assuming ?n > 2 in theorem 3.6. 

Example 3.7. We compute the i^^((0, 1) x (0, l))-projection of the smooth convex 
function 

u{xi^X2) = {x2 — 0.5xi — 0.25)^, 

onto the set of continuous piecewise linear functions that are FE-convex over criss- 
cross meshes, as those obtained using longest-edge or newest-vertex bisection. Since 
u is convex, the projections u^ should converge to i^ as /i ^ 0, but this is not the case 
in this example. 

In figure 3.1 we can observe a sequence of meshes and the level curves of the 
L^-projection of u into the set of FE-convex piecewise linear functions on each mesh. 
The level curves of the exact function u, which is convex, are straight lines parallel 
to the line X2 = 0.5xi. Nevertheless, the level curves of the approximants converge 
to ellipses which do not straighten up by refinement. This is a clear indication that 
(uh) does not converge to u diS h ^ 0. We remark that the same behavior is observed 
when projecting in H^, L^ and L^, and even when imposing boundary values. 

Summing up: although there is some sort of super-convergence for some meshes, 
for general meshes — even highly regular — FE-convex piecewise linear functions may 
not suffice to approximate convex functions. 
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Fig. 3.1. Example showing that convergence cannot be guaranteed when using piecewise linear 
functions. Meshes (top), and L"^ -projection into FE-convex piecewise linear functions (bottom). The 
level curves of the projected function — which is already convex and should be the limit of the discrete 
ones — are straight lines, which are not reproduced by the approximants. The meshes have 256, 1024, 
and 4096 elements. 



4. Approximating functionals. We are now in position to apply finite element 
approximations to a wide class of optimization problems on convex functions. 
Let us describe this technique by assuming, for instance, that the functional 



J{v) 



F{x^ v{x)^gi:didv{x)) dx 



is defined and continuous on H^(Q)^ and we are interested in the optimization problem 

inf{J(v) :veC}, (4.1) 

where C is a family of convex functions, C C H^(yt). 

Using theorem 3.6, it may be not too difficult to define for each /i > a family 
C/i C 14, and a functional Jh defined on C/^, such that: 

1. H^Vh{x) h for ah Vh e Ch and s G ll"^^^, 

2. for any v G C and any e > 0, there exists h > and Vh G Ch such that 

\Mvh)-J{v)\ <€. 

Under the previous conditions, it is easy to prove that (cf. [1]) 



inf {J(v) :veC}=\im inf {Jh{vh) : Vh G C^}. 



h^O 



(4.2) 



As a concrete example, suppose that C is the set of all convex functions in H^{Q) 
with a given mean value, or some prescribed boundary values, / G H^{Q)^ and the 
continuous problem consists in finding u e C such that 

h- fWm =min 11^-/11^1, 
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i.e., minimizing J{v) := \\v — /||^i over C. 

In order to compute an approximation of u we may thus consider Ch as the set 
of discrete functions v^ G Vh with H^v^ h for ah s G ^t^sf which also satisfy the 
integral or boundary constraints. 

Assuming exact integration we can set Jh{vh) = ^('^/i), or otherwise, Jh{vh) may 
result from some fixed quadrature rule on the elements of the mesh. In both cases it 
is easy to see that the previous assumptions hold, and thus the discrete minimizers 
Uh G Ch provide a convergent (sub) sequence to the exact solution u. 

5. Numerical Experiments. 

5.1. Implementation issues. The numerical examples were implemented using 
the finite element toolbox ALBERTA [19] for assembling the optimization problem, and 
CSDP [2] for solving the corresponding semidefinite programs. The experiments were 
run on a desktop PC with a 2.8 GHz Intel Pentium IV processor and 2GB of RAM. 

In our experiments we used Lagrange finite elements of polynomial degree 2 for 
both Vh and Wh, over simplicial meshes, but the right choice might depend on the 
precise problem at hand. 

Regarding the implementation in ALBERTA, we had to introduce some modified 
basis functions when using quadratic finite elements as test functions in Wh- The 
canonical nodal basis functions associated with the vertices of the elements change 
sign, and the theory requires that the test functions be non-negative. To do this, we 
considered the usual piecewise linear nodal basis functions for the vertices, whereas for 
the nodes that correspond to the midpoints of the edges we used the usual quadratic 
bubbles, which are obtained as the product of the two linear basis functions that 
correspond to the vertices of the edge. 

In the initial experiments we observed some oscillations at the boundary, but 
they ceased to appear when we incorporated into the test space the basis functions 
corresponding to the boundary nodes, enlarging Wh so that it is no longer a subset 
of Hq{Q). In this case, formula (2.3) was transformed into 

{{Hu)ij^(p) = {Hu^(p)ij = — / diu{x) dj(f{x)dx -\- / diu{x) (p{x)uj dS, (5.1) 

Jn Jon 

where Uj denotes the j-th component of the outward unit normal to dQ. This slight 
modification still leads to the same theoretical results of the previous sections. We 
decided to present them assuming zero boundary values for the test functions, in order 
to keep the presentation clearer. 

5.2. Statement of the discrete problems. In the examples that follow, we 
always considered the minimization of functionals of the form 

J{u) = / (^a\gT8id{u — vi)\ -\-(3\u — V2\ + 7 • gradii + /ix) dx, 

where a, /?, 7, ^i, ^2, and / are given functions on Q. Appropriate choices of these 
functions lead to functionals whose minima are the L'^{Q)- or the i!f^(l]) -project ion 
of a function, or the solutions to problems (1.2), or (1.3). The approximate functional 
Jh{vh) results from applying some fixed quadrature rule on the elements of the mesh. 
In order to model this problem as a semidefinite program on each given mesh, we 
used a fixed quadrature rule (exact for polynomials of degree < 4) over the elements 
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of the mesh and approximated the functional by 



j=i 



^dXn 



dxj 



-^ (3{xi){u{xi) -V2{xi))'^ +7(^i) •grad2x(xi) -^ f{xi)u{xi 



where x^, i^^, i = 1, 2, . . . , A^ are the quadrature points and weights, respectively. The 
minimization of Jh was then modeled by adding {d -\- 1)N auxiliary variables tij^ Si^ 
i = l,2,...,Ar, j = 1,2, ...,d, as 

N r- d 

minimize ^ Wi a{xi) ^ Uj + P{xi)si + j{xi) • grad u{xi) + f{xi)u{xi) 



i=l 



subject to 



^S^^^^^"^^^^^^'-^^'^' 



{u{Xi) -V2{Xi))'^ < Si 

for i = 1, 2, . . . , A/", and j = 1, 2, . . . , d, plus the convexity constraints. In turn, the 
constraints involving squares are modeled, respectively, by 



du 



1 



'"(^o-i^(^.) 



c^x 



>-0 



and 



1 ii(Xi) - V2(^i) 



^0. 



5.3. Adapt ivity. In order to take full advantage of the flexibility of flnite ele- 
ments, we included some adaptivity into our algorithms, which was implemented as 
a loop of the form [15] 

SOLVE -^ ESTIMATE -^ MARK -^ REFINE. 

The step SOLVE consisted in solving the resulting semi-deflnite programs using CSDP. 
Having computed the discrete solution Uh^ the step estimate consisted in estimating 
the error distribution over the triangulation T^ in the following way: we deflned for 
each T G T^ the quantity r]T = hrj \\[gr8iduh]\\L'^(dT)j where [gradix/i]|5 denotes the 
jump of grad^/^ over the inter-element sides S and is deflned as zero at boundary 
sides. This quantity r]T is the dominating part of the residual- type a posteriori error 
estimator for Poisson's problem, and we used it as a heuristic indicator of the error. 
Further studies are necessary in order to develop rigorous upper and lower bounds 
for the error in this type of problems, an open question which is out of the scope of 
this article; we introduced a heuristic error estimator here just to show the power of 
flnite elements and the great improvement in performance that adaptivity can provide. 
The step mark, consisted in marking (selecting for reflnement) all the elements whose 
indicators satisfled tjt > O.Tr^max, where 77max •= max^^^-^^ tjt- The step refine was 
implemented using the standard routines of ALBERTA, which perform newest- vertex 
bisection, guaranteeing a uniform shape-regularity constant. 
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Fig. 5.1. Contour lines and meshes obtained with the adaptive version of the algorithm. We 
show the meshes of iterations 0, 2, and 4, with 16 (41), I40 (299), and 388 (799), elements (DOFs), 
respectively. 



5.4. Examples. Example 5.1. In this example we apply our algorithm to solve 
the monopolist problem (1.3), for d = 2, / = 1 and c = 0. In this case the exact 
solution is known to be 



u{xi,X2) = max{0,xi — a,X2 — a, xi -\- X2 — b}, 

where a = 2/3 and 6 = (4 — a/2)/3, and allows us to compute the true error. The 
method is applied using quadratic elements both in the trial and test spaces. In 
figure 5.1 we show a sequence of solutions using adaptive meshes, and in table 5.1 we 
can observe the error. In Figure 5.2 we show the final mesh, approximate and exact 
solution, after 6 iterations. In order to illustrate the performance of the adaptive 
method, we also include in table 5.1 the errors and CPU-times obtained with uniform 
meshes. The reported CPU-times correspond to the time taken by CSDP to find 
the minimum of the functional on the given mesh. To be fair in the comparison, we 
should look at the cumulative sum of the CPU-times, since the whole adaptive process 
is necessary in order to arrive at the graded meshes. 

Example 5.2. In this example we apply our algorithm to minimize the functional 
defined in (1.2) over the set of convex functions in {v G H^{Q) • /^'^ = O} (cf. [6]). 
We consider Q = {(xi,X2) G 



^^ : Xi +^2 < 1} and 



f{xi,X2) 



1 if X? + (X2 + 1)2 < 1/4, 
-1 ifx2 + (x2-l)^ <l/4, 
otherwise. 




In figure 5.3 we show the outcome of our method using adaptivity. We can observe 
that the solution tries to satisfy Au = / in places where / > 0, and it continues to 
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Fig. 5.2. Final mesh, approximate and exact solution, after 6 iterations. The mesh has 802 
elements and 164-1 degrees of freedom were used with quadratic elements. The code CSDP took 211 
seconds to minimize the functional, and the L'^ -error between the approximate and exact solution is 
0.00192. To obtain a similar error with uniform meshes, more than 4096 elements and 8321 DOFs 
are needed, which forces CSDP to work more than 7000 seconds (see table 5.1). 



Elements 


DOFs 


CPU-time 


L^-error 


L°° -error 


16 


41 


0.00 


0.06371 


0.11168 


54 


123 


1.00 


0.01597 


0.06214 


140 


299 


11.00 


0.00935 


0.02589 


248 


519 


25.00 


0.01075 


0.01553 


388 


799 


59.00 


0.01023 


0.01633 


542 


1117 


94.00 


0.01085 


0.01529 


802 


1641 


211.00 


0.00192 


0.00846 



(a) 



Elements 


DOFs 


CPU-time 


i^-error 


L~-error 


16 


41 


1 


0.06371 


0.11168 


64 


145 


2 


0.01644 


0.06250 


256 


545 


22 


0.01202 


0.02347 


1024 


2113 


343 


0.01091 


0.01780 


4096 


8321 


7211 


0.01270 


0.03508 



(b) 

Table 5.1 
Errors and CPU-time (in seconds) for the adaptive (a) and uniform-refinement (b) solutions 
of the monopolist problem. 



be convex outside that region, minimizing \Au — /|. In the lower part of the domain 
(around the point (0, —1)), the solution u satisfies Au = f and on the boundary, the 
natural homogeneous Neumann boundary conditions du/dv = are satisfied. This 
can be seen by the fact that the level curves are perpendicular to the boundary in that 
part of d^. In the upper part of the domain the solution is just linear (ruled surface), 
which is a consequence of the fact that the Laplacian of u has to be non-negative and 
as close to —1 as possible, keeping u convex over the whole domain. 

The adaptive method correctly captures the region where u is fiat, representing 
the solution with a minimal number of elements. 
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Fig. 5.3. Minimizer of the functional (1.2). Top: initial and final mesh, and contour curves 
of the solution. Bottom: surface plot of the solution as viewed from (0, —10, 0) (left), (—10, —10, 0) 
(middle), and ( — 10,0,0) (right). 



6. Conclusions. We have proposed a novel way of imposing convexity on finite 
element functions, and proved that this new definition solves the two issues necessary 
for the approximation of optimization problems over convex functions: 

• Every convex function can be approximated; and 

• If a sequence of FE-convex functions is convergent, then the limit is convex. 
Our definition takes advantage of the existence of efficient codes for solving 

semidefinite programs, and uses a new definition of discrete Hessians, which is based 
upon a weak Hessian for H^ functions. One interesting aspect is that the definition is 
intrinsic, i.e., it only uses the values of the discrete functions at the nodes, and locals 
leading to a set of constraints with cardinality of order 0(iV), N being the number of 
vertices or nodes of the mesh. Furthermore, it is very simple to program in any space 
dimension. 

Another interesting — and puzzling — issue is the fact that, in general, except for 
some particular very regular meshes, the discrete functions need to have an approx- 
imation order higher than the one provided by linears. Our proof requires this as- 
sumption, and we found numerical evidence that this is necessary, but a better expla- 
nation/understanding of this issue is still pending. 

Numerical experiments show a competitive performance, specially through the use 
of adaptivity, which, in turn, is easy to implement for finite elements. Our preliminary 
computations using a heuristic error indicator are promising, but a lot needs to be 
done in this direction, namely, to find a posteriori error indicators which are reliable 
and efficient, and once this is established, to prove convergence, and optimality. These 
are difficult open questions that will be subject of future research. 
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